Open User Involvement in Data Cleaning for Data Warehouse Quality
Abstract
A high-quality data warehouse is key to making sound strategic decisions. Data cleaning is the process that deals with quality problems in data extracted from operational sources before the data is loaded into the data warehouse. Because data cleaning can itself introduce errors, and some data must be cleaned manually, there is a need for open user involvement in data cleaning for data warehouse quality. User involvement is essential for validating the cleaned data, for replacing the dirty data in its original sources, and for correcting poor data that cannot be cleaned automatically. In this paper, we extend the data cleaning and extract-transform-load (ETL) processes to better support user involvement in data quality management. We propose that the ETL processes include two phases: a transformation phase that cleans data at the operational data sources, and a propagation phase that sends the cleaned data back to its original sources. The benefits of our proposal are twofold. First, users validate the cleaned data. Second, the quality of the operational data sources improves. Consequently, data cleaning based on user involvement leads to total data quality management and avoids repeating the same cleaning in future warehousing.
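The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the in-memory `source` dictionary, the `clean_country` rule, and the `propose`, `review`, `propagate`, and `decide` helpers are all hypothetical names chosen for illustration. The sketch only shows the general shape of the idea: automatic cleaning produces proposals, a user accepts, rejects, or manually corrects them, and validated values are written back to the operational source before loading.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proposal:
    """A change suggested by automatic cleaning, awaiting user review."""
    key: int
    field: str
    old: str
    new: Optional[str]  # None means the rule could not repair the value


def clean_country(value):
    # Hypothetical cleaning rule: normalise a few known spellings.
    known = {"usa": "US", "u.s.": "US", "us": "US", "france": "FR", "fr": "FR"}
    return known.get(value.strip().lower())


def propose(source, field, rule):
    # Phase 1 (transformation): run the rule against the operational
    # source, recording proposals instead of overwriting anything.
    proposals = []
    for key, row in source.items():
        new = rule(row[field])
        if new != row[field]:
            proposals.append(Proposal(key, field, row[field], new))
    return proposals


def review(proposals, decide):
    # User involvement: decide() returns the value to keep,
    # or None to reject the proposal and leave the source untouched.
    validated = []
    for p in proposals:
        value = decide(p)
        if value is not None:
            validated.append((p, value))
    return validated


def propagate(source, validated):
    # Phase 2 (propagation): write validated values back to the source.
    for p, value in validated:
        source[p.key][p.field] = value


# Illustrative data (made up for this sketch).
source = {1: {"country": "usa"}, 2: {"country": "FR"}, 3: {"country": "Frnace"}}


def decide(p):
    # Stand-in for an interactive user: accept automatic fixes,
    # and supply a manual correction where the rule failed.
    if p.new is not None:
        return p.new
    manual = {"Frnace": "FR"}
    return manual.get(p.old)


proposals = propose(source, "country", clean_country)
validated = review(proposals, decide)
propagate(source, validated)

# Load into the warehouse only after the source itself has been corrected.
warehouse = [dict(row, id=key) for key, row in source.items()]
```

Because the corrections land in the operational source, a later warehousing run over the same records would produce no new proposals under this rule, which is the "avoids repeating the same cleaning" benefit the abstract claims.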
Similar resources
Enactment of Medium and Small Scale Enterprise ETL(MaSSEETL)-an Open Source Tool
Data quality is a major area of concern in a data warehouse environment. ETL tools focus on detection and correction of data quality problems that affect the success of a data warehouse. Data imported from sources into the data warehouse often differs in quality, format, coding, etc. In order to bring all the data together in a standard, homogeneous environment, Extraction–transformation–loading ...
A Unified Framework and Sequential Data Cleaning Approach for a Data Warehouse
Data cleaning is the process of identifying and removing errors in the data warehouse. Data cleaning is very important in the data mining process. Most organizations need quality data. The quality of the data needs to be improved in the data warehouse before the mining process. The framework available for data cleaning offers the fundamental services for data cleaning s...
Support for User Involvement in Data Cleaning
Data cleaning and ETL processes are usually modeled as graphs of data transformations. The involvement of the users responsible for executing these graphs over real data is important to tune data transformations and to manually correct data items that cannot be treated automatically. In this paper, in order to better support the user involvement in data cleaning processes, we equip a data clean...
A Review of Contemporary Data Quality Issues in Data Warehouse ETL Environment
In today’s scenario, extraction–transformation–loading (ETL) tools have become important pieces of software responsible for integrating heterogeneous information from several sources. Carrying out the ETL process is potentially complex, hard, and time-consuming. Organisations nowadays are concerned about vast qualities of data. The data quality is concerned with technical issue...
Identification of Categorical Registration Data of Domain Names in Data Warehouse Construction Task
This work is dedicated to the formation of a data warehouse for processing a large volume of domain name registration data. Data cleaning is applied in order to increase the effectiveness of decision-making support. Data cleaning is applied in warehouses to detect and delete errors and discrepancies in data in order to improve their quality. For this purpose, fuzzy record comparison algori...